Take-Home Exercise 1: Application of Spatial Point Patterns Analysis to discover geographical distribution of functional & non-functional water points in Osub State, Nigeria

Author

Wong Kelly

Modified

February 12, 2023

1. Overview

Water is an important resource to mankind. Clean and accessible water is critical to human health. It provides a healthy environment, a sustainable economy, reduces poverty and ensures peace and security. Yet over 40% of the global population does not have access to sufficient clean water. By 2025, 1.8 billion people will be living in countries or regions with absolute water scarcity, according to UN-Water. The lack of water poses a major threat to several sectors, including food security. Agriculture uses about 70% of the world’s accessible freshwater.

Developing countries are most affected by water shortages and poor water quality. Up to 80% of illnesses in the developing world are linked to inadequate water and sanitation. Despite technological advancement, providing clean water to the rural community is still a major development issues in many countries globally, especially countries in the Africa continent.

To address the issue of providing clean and sustainable water supply to the rural community, a global Water Point Data Exchange (WPdx) project has been initiated. The main aim of this initiative is to collect water point related data from rural areas at the water point or small water scheme level and share the data via WPdx Data Repository, a cloud-based data library. What is so special of this project is that data are collected based on WPDx Data Standard.

1.0 Objectives

Geospatial analytics hold tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate spatial point patterns analysis methods to discover the geographical distribution of functional and non-function water points and their co-locations if any in Osun State, Nigeria.

1.1 Data Acquisition

Apstial data

For the purpose of this assignment, data from WPdx Global Data Repositories will be used. There are two versions of the data. They are: WPdx-Basic and WPdx+. You are required to use WPdx+ data set.

Geospatial data

This study will focus of Osun State, Nigeria. The state boundary GIS data of Nigeria can be downloaded either from The Humanitarian Data Exchange portal or geoBoundaries.

2. Getting started

2.1 Installing and Loading the R packages

For take-home assignment 1, we will need to install the following packages:

pacman::p_load(sf, funModeling,maptools,raster, spatstat, tmap ,  tidyverse, sfdep)

The code chunk is to check that all the required packages are installed if not, install them.

if (!require(sf)) {
install.packages("sf")
}
if (!require(funModeling)) {
install.packages("funModeling")
}
if (!require(maptools)) {
install.packages("maptools")
}
if (!require(raster)) {
install.packages("raster")
}
if (!require(spatstat)) {
install.packages("spatstat")
}
if (!require(tmap)) {
install.packages("tmap")
}
if (!require(tidyverse)) {
install.packages("tidyverse")
}

3. Data Wrangling: Geospatial Data & Aspatial Data

3.1 Importing geoBoundaries Data into R

In this section of 3.1, st_read() of sf package will be used to import geospatial geoboundaries-NGA data set into R.

geoNGA <- st_read("data/geospatial/",
                  layer = "geoBoundaries-NGA-ADM2")
Reading layer `geoBoundaries-NGA-ADM2' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84

From the output message, we learn that:

  • Geometry type of geoBoundaries dataset is multipolygon

  • 774 features and 5 fields

  • Assigned CRS is WGS 84 (geographic coordinate system)

  • Dimension is XY

3.2 Importing Geospatial NGA Data into R

In this section of 3.2, st_read() of sf package will be used to import geospatial NGA dataset into R.

We filter data to only Osun state as that is what we are interested in finding for this assignment!

NGA <- st_read("data/geospatial/",
               layer = "nga_admbnda_adm2_osgof_20190417") %>%
  filter(ADM1_EN == "Osun") %>% 
  st_transform(crs = 26392)
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84

From the output message, we learn that:

  • Geometry type of NGA dataset is multipolygon

  • 774 features and 16 fields

  • Assigned CRS is WGS 84 (geographic coordinate system)

  • Dimension is XY

In geospatial analytics, we need to transform the original data that is in geographic coordinate system (WGS) to projected coordinate system. This is because geographic coordinate system is not appropriate if the analysis need to use distance and/or area measurements.

Therefore, we need to transform NGA dataset to projected coordinate system by using st_transform() in sf package. (will be further elaborate in section 3.3.1 and 3.3.2)

By examining both sf dataframe closely, we notice that NGA provide both LGA and state information.

Hence, NGA data.frame will be used for the subsequent processing.

3.3 Importing Aspatial Data into R

In this section of 3.3, read_csv() will be used to import asptial data set into R and we filter out only Nigeria, Osun data rows as those are what we interested in analysing for this project.

wp_nga <- read_csv("data/aspatial/WPdx.csv") %>%
  filter(`#clean_adm1` == "Osun") %>%
  filter(`#clean_country_name` == "Nigeria")

3.3.1 Converting Water Point Data into SF Point Features

Step 1: Convert the wkt field into sfc field by using st_as_sfc() data type.

wp_nga$Geometry = st_as_sfc(wp_nga$`New Georeferenced Column`)
wp_nga
# A tibble: 5,745 × 75
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
    <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 225950 Federal Minis…    7.43    4.26 05/05/… Yes     Boreho… Well    Hand P…
 2 225524 Federal Minis…    7.78    4.56 04/22/… Yes     Protec… Well    Hand P…
 3 197014 Federal Minis…    7.49    4.53 04/30/… Yes     Boreho… Well    Mechan…
 4 225173 Federal Minis…    7.93    4.73 05/02/… Yes     Boreho… Well    Hand P…
 5 225843 Federal Minis…    7.74    4.44 05/08/… Yes     Boreho… Well    Hand P…
 6 235508 Federal Minis…    7.15    4.64 04/27/… Yes     Protec… Well    Hand P…
 7 197708 Federal Minis…    7.87    4.72 05/13/… Yes     Boreho… Well    Mechan…
 8 195041 Federal Minis…    7.73    4.45 06/17/… Yes     Protec… Spring  <NA>   
 9 225222 Federal Minis…    7.81    4.15 05/14/… Yes     Protec… Spring  Mechan…
10 460770 GRID3             7.4     4.33 06/13/… Unknown Boreho… Well    <NA>   
# … with 5,735 more rows, 66 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …

Step 2: Convert the tibble data.frame into an sf object by using st_sf(). It is also important for us to include the referencing system of the data into the sf object.

wp_sf <- st_sf(wp_nga, crs=4326)
wp_sf
Simple feature collection with 5745 features and 74 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4.032004 ymin: 7.060309 xmax: 5.06 ymax: 8.061898
Geodetic CRS:  WGS 84
# A tibble: 5,745 × 75
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
 *  <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 225950 Federal Minis…    7.43    4.26 05/05/… Yes     Boreho… Well    Hand P…
 2 225524 Federal Minis…    7.78    4.56 04/22/… Yes     Protec… Well    Hand P…
 3 197014 Federal Minis…    7.49    4.53 04/30/… Yes     Boreho… Well    Mechan…
 4 225173 Federal Minis…    7.93    4.73 05/02/… Yes     Boreho… Well    Hand P…
 5 225843 Federal Minis…    7.74    4.44 05/08/… Yes     Boreho… Well    Hand P…
 6 235508 Federal Minis…    7.15    4.64 04/27/… Yes     Protec… Well    Hand P…
 7 197708 Federal Minis…    7.87    4.72 05/13/… Yes     Boreho… Well    Mechan…
 8 195041 Federal Minis…    7.73    4.45 06/17/… Yes     Protec… Spring  <NA>   
 9 225222 Federal Minis…    7.81    4.15 05/14/… Yes     Protec… Spring  Mechan…
10 460770 GRID3             7.4     4.33 06/13/… Unknown Boreho… Well    <NA>   
# … with 5,735 more rows, 66 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …

3.3.2 Transforming into Nigeria Projected Coordinate System

wp_sf <- wp_sf %>%
  st_transform(crs = 26392)
wp_sf
Simple feature collection with 5745 features and 74 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 177285.9 ymin: 340054.1 xmax: 291287.1 ymax: 450859.7
Projected CRS: Minna / Nigeria Mid Belt
# A tibble: 5,745 × 75
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
 *  <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 225950 Federal Minis…    7.43    4.26 05/05/… Yes     Boreho… Well    Hand P…
 2 225524 Federal Minis…    7.78    4.56 04/22/… Yes     Protec… Well    Hand P…
 3 197014 Federal Minis…    7.49    4.53 04/30/… Yes     Boreho… Well    Mechan…
 4 225173 Federal Minis…    7.93    4.73 05/02/… Yes     Boreho… Well    Hand P…
 5 225843 Federal Minis…    7.74    4.44 05/08/… Yes     Boreho… Well    Hand P…
 6 235508 Federal Minis…    7.15    4.64 04/27/… Yes     Protec… Well    Hand P…
 7 197708 Federal Minis…    7.87    4.72 05/13/… Yes     Boreho… Well    Mechan…
 8 195041 Federal Minis…    7.73    4.45 06/17/… Yes     Protec… Spring  <NA>   
 9 225222 Federal Minis…    7.81    4.15 05/14/… Yes     Protec… Spring  Mechan…
10 460770 GRID3             7.4     4.33 06/13/… Unknown Boreho… Well    <NA>   
# … with 5,735 more rows, 66 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …

From the output message, we learn that:

  • Geometry type of NGA dataset is now point

  • 5745 features and 74 fields

  • Projected CRS: Minna/Nigeria Mid Belt

  • Dimension: XY

We have successfully transformed the data!! :D

4. Data Pre-Processing

Before we can visualise our dataset and do the necessary analysis, we have to do data cleaning which is an important step in any data science task including geospatial data science. Things to check in the dataset:

  • Invalid geometries

  • Exclude redundancy

  • Missing value

  • Duplicate name

4.1 Check for Invalid Geometries

length(which(st_is_valid(NGA) == FALSE))
[1] 0
length(which(st_is_valid(wp_sf) == FALSE))
[1] 0

From the above generated output message, there are no invalid geometries! Great!

4.2 Exclude Redundancy

NGA <- NGA %>%
  select(c(3:4, 8:9))

4.3 Check for Missing Value

NGA[rowSums(is.na(NGA))!=0,]
Simple feature collection with 0 features and 4 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: Minna / Nigeria Mid Belt
[1] ADM2_EN    ADM2_PCODE ADM1_EN    ADM1_PCODE geometry  
<0 rows> (or 0-length row.names)

The printout shows that there is zero missing value in the dataset!

4.4 Check for Duplicate Name

NGA$ADM2_EN[duplicated(NGA$ADM2_EN)==TRUE]
character(0)

Great!

Now, we are ready to analyse the dataset!

5. Data Wrangling for Water Point Data

Exploratory Data Analysis (EDA) is a popular approach to gain initial understanding of the data.

Firstly, we take a look at all the column names in wp_sf dataset and identify the column we need to plot status_clean frequency bar chart.

colnames(wp_sf)
 [1] "row_id"                      "#source"                    
 [3] "#lat_deg"                    "#lon_deg"                   
 [5] "#report_date"                "#status_id"                 
 [7] "#water_source_clean"         "#water_source_category"     
 [9] "#water_tech_clean"           "#water_tech_category"       
[11] "#facility_type"              "#clean_country_name"        
[13] "#clean_adm1"                 "#clean_adm2"                
[15] "#clean_adm3"                 "#clean_adm4"                
[17] "#install_year"               "#installer"                 
[19] "#rehab_year"                 "#rehabilitator"             
[21] "#management_clean"           "#status_clean"              
[23] "#pay"                        "#fecal_coliform_presence"   
[25] "#fecal_coliform_value"       "#subjective_quality"        
[27] "#activity_id"                "#scheme_id"                 
[29] "#wpdx_id"                    "#notes"                     
[31] "#orig_lnk"                   "#photo_lnk"                 
[33] "#country_id"                 "#data_lnk"                  
[35] "#distance_to_primary_road"   "#distance_to_secondary_road"
[37] "#distance_to_tertiary_road"  "#distance_to_city"          
[39] "#distance_to_town"           "water_point_history"        
[41] "rehab_priority"              "water_point_population"     
[43] "local_population_1km"        "crucialness_score"          
[45] "pressure_score"              "usage_capacity"             
[47] "is_urban"                    "days_since_report"          
[49] "staleness_score"             "latest_record"              
[51] "location_id"                 "cluster_size"               
[53] "#clean_country_id"           "#country_name"              
[55] "#water_source"               "#water_tech"                
[57] "#status"                     "#adm2"                      
[59] "#adm3"                       "#management"                
[61] "#adm1"                       "New Georeferenced Column"   
[63] "lat_deg_original"            "lat_lon_deg"                
[65] "lon_deg_original"            "public_data_source"         
[67] "converted"                   "count"                      
[69] "created_timestamp"           "updated_timestamp"          
[71] "#pay_clean"                  "#subjective_quality_clean"  
[73] "is_duplicate"                "dataset_title"              
[75] "Geometry"                   

Now, once we have the column name “#status_clean”, we use the “table” function to get the frequency of unique values in a vector.

This will return a frequency table with unique values in “wp_sf$‘#status_clean’ and their corresponding frequency. The”sort” function will sort the table based on the frequency.

sort(table(wp_sf$"#status_clean"), decreasing = TRUE)

               Functional            Non-Functional  Functional, needs repair 
                     2406                      2086                       259 
      Non-Functional, dry    Functional, not in use  Abandoned/Decommissioned 
                      159                        64                        15 
Functional but not in use 
                        8 

To plot a bar chart based on the frequency table, use the “barplot” function in R.

The below code will create a bar plot of the frequency table, with the x-axis labeled “status_clean” and the y-axis labeled “Frequency”. The main title of the plot will be “Bar Plot of status_clean”.

#Set the colour scheme for the bar
colors <- c("grey","pink","purple","blue","green","yellow")

#Plot the frequency of status_clean in bar chart
freq_table <- sort(table(wp_sf$"#status_clean"), decreasing = TRUE)
barplot(freq_table, xlab = "status_clean", ylab = "Frequency", main = "Bar Plot of status_clean", col = colors)

The below code chunk will include percentage labels on the bar plot!

# calculate the percentage of each status_clean value 
freq_table_pct <- round(100* prop.table(freq_table),2)

# plot the bar plot with the percentage labels 
barplot(freq_table, xlab = "status_clean", ylab = "Frequency", main = "Bar Plot of status_clean (Percentage)", col = colors, las = 2)
text(x = 1:length(freq_table), y = freq_table + 0.5, labels = paste(freq_table_pct, "%"), pos = 3, cex = 0.7)

Next, code chunk below will be used to perform the following data wrangling tasksP - rename() of dplyr package is used to rename the column from #status_clean to status_clean for easier handling in subsequent steps. mutate() and replace_na() are used to recode all the NA values in status_clean into unknown.

wp_sf_nga <- wp_sf %>% 
  rename(status_clean = '#status_clean') %>%
  select(status_clean) %>%
  mutate(status_clean = replace_na(
    status_clean, "unknown"))

5.1 Extracting Water Point Data

Now we are ready to extract the water point data according to their status.

The code chunk below is used to extract functional water point.

wp_functional <- wp_sf_nga %>%
  filter(status_clean %in%
           c("Functional",
             "Functional but not in use",
             "Functional but needs repair"))

The code chunk below is used to extract nonfunctional water point.

wp_nonfunctional <- wp_sf_nga %>%
  filter(status_clean %in%
           c("Abandoned/Decommissioned",
             "Abandoned",
             "Non-Functional due to dry season",
             "Non-Functional",
             "Non functional due to dry season"))

The code chunk below is used to extract water point with unknown status.

wp_unknown <- wp_sf_nga %>%
  filter(status_clean == "unknown")

5.2 Performing Point-in-Polygon Count

Next, we want to find out the number of total, functional, nonfunctional and unknown water points in each LGA. This is performed in the following code chunk.

First, it identifies the functional water points in each LGA by using st_intersects() of sf package. Next, length() is used to calculate the number of functional water points that fall inside each LGA.

NGA_wp <- NGA %>% 
  mutate(`total_wp` = lengths(
    st_intersects(NGA, wp_sf_nga))) %>%
  mutate(`wp_functional` = lengths(
    st_intersects(NGA, wp_functional))) %>%
  mutate(`wp_nonfunctional` = lengths(
    st_intersects(NGA, wp_nonfunctional))) %>%
  mutate(`wp_unknown` = lengths(
    st_intersects(NGA, wp_unknown)))

Notice that four new derived fields have been added into NGA_wp sf data.frame.

We can visualise the summary of NGA_wp sf dataframe in statistics forms such as mean, median, and max etc for both functional and nonfunctional by using summary() as shown in the code chunk below:

summary(NGA_wp)
   ADM2_EN           ADM2_PCODE          ADM1_EN           ADM1_PCODE       
 Length:30          Length:30          Length:30          Length:30         
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
          geometry     total_wp     wp_functional    wp_nonfunctional
 MULTIPOLYGON :30   Min.   : 60.0   Min.   : 14.00   Min.   : 17.0   
 epsg:26392   : 0   1st Qu.:134.0   1st Qu.: 50.50   1st Qu.: 46.5   
 +proj=tmer...: 0   Median :166.5   Median : 75.00   Median : 58.0   
                    Mean   :182.9   Mean   : 77.20   Mean   : 65.9   
                    3rd Qu.:212.0   3rd Qu.: 91.75   3rd Qu.: 86.5   
                    Max.   :443.0   Max.   :249.00   Max.   :140.0   
   wp_unknown   
 Min.   : 4.00  
 1st Qu.:13.00  
 Median :22.00  
 Mean   :24.47  
 3rd Qu.:32.75  
 Max.   :78.00  

5.3 Visualising attributes by using statistical graphs

ggplot(data = NGA_wp,
       aes(x = total_wp)) + 
  geom_histogram(bins=20,
                 color="black",
                 fill="light blue") +
  geom_vline(aes(xintercept=mean(
    total_wp, na.rm=T)),
             color="red", 
             linetype="dashed", 
             size=0.8) +
  ggtitle("Distribution of total water points by LGA") +
  xlab("No. of water points") +
  ylab("No. of\nLGAs") +
  theme(axis.title.y=element_text(angle = 0))

5.3.1 Observation from Statistical graph of NGA waterpoints

  • The histogram is a right-skewed distribution where the long tail extends to the right whole most values cluster on the left, as shown above in section 5.3.

  • There are a few possible outliers in the above histogram and we can use the 1.5 interquartile range (IQR) criterion to check whether they can be considered as outliers.

    • Q1 = 134 and Q3 = 212, which give an IQR = Q3-Q1 = 78

    • Q1 - 1.5(IQR) = 134 - (1.5)(78) = 17

    • Q3 + 1.5(IQR) = 212 + (1.5)(78) = 329

  • The 1.5(IQR) criterion tells us that any observation with water points that is below 17 or above 329 is considered a suspected outlier.

  • With that, we can conclude that there are three outliers in this histogram and those with an arrow above are the ones.

5.4 Saving the analytical data in rds format

write_rds(NGA_wp, "data/rds/NGA_wp.rds")

6. Geospatial Mapping

6.1 Basic Choropleth Mapping

In this section, will be plotting different choropleth maps to analyse the distribution of water point in Nigeria, Osun state.

tmap_mode("plot")
qtm(NGA_wp, 
    fill = c("wp_functional","wp_nonfunctional", "wp_unknown"))

p1 <- tm_shape(NGA_wp) +
  tm_fill("wp_functional",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of functional water point by LGAs",
            legend.outside = FALSE)
p2 <- tm_shape(NGA_wp) +
  tm_fill("total_wp",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of total  water point by LGAs",
            legend.outside = FALSE)
tmap_arrange(p2, p1, nrow = 1)

6.1.1 Observation of the Distribution of Water Points in Nigeria, Osun state.

From the map generated earlier in section 6.1 that shows the distribution of water points in Nigeria, Osun state according to their functionality. We observed a few things:

  • At first glance, non-functional water map has a higher intensity based on its colours spread compared to the other two maps on its right and left. That could means that non-functional water points are more widely spread in Nigeria, Osun state.

  • On the other hand, generally, functional water map has a least colours intensity spread compared to the other two maps. That could means that functional water points are less common/least spread in Nigeria, Osun state.

In conclusion, Nigeria, Osun state has a higher level of non-functional water than functional water.

6.2 Choropleth Map for Rates: Deriving Proportion of Functional Water Points and Non-Functional Water Points

We will tabulate the proportion of functional water points and the proportion of non-functional water points in each LGA. In the following code chunk, mutate() from dplyr package is used to derive two fields, namely pct_functional and pct_nonfunctional.

NGA_wp <- NGA_wp %>%
  mutate(pct_functional = wp_functional/total_wp) %>%
  mutate(pct_nonfunctional = wp_nonfunctional/total_wp)
tm_shape(NGA_wp) +
  tm_fill("wp_functional",
          n = 10,
          style = "equal",
          palette = "Blues",
          title = "Dependency ratio",
          legend.hist = TRUE) +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Rate map of functional water point by LGAs",
            legend.outside = TRUE)

6.3 Extreme Value Maps: Percentile Map

Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (Anselin 1994).

The percentile map is a special type of quantile map with six specific categories: 0-1%,1-10%, 10-50%,50-90%,90-99%, and 99-100%. The corresponding breakpoints can be derived by means of the base R quantile command, passing an explicit vector of cumulative probabilities as c(0,.01,.1,.5,.9,.99,1). Note that the begin and endpoint need to be included.

Step 1: Exclude records with NA by using the code chunk below

NGA_wp <- NGA_wp %>%
  drop_na()

Step 2: Creating customised classification and extracting values

percent <- c(0,.01,.1,.5,.9,.99,1)
var <- NGA_wp["pct_functional"] %>%
  st_set_geometry(NULL)
quantile(var[,1], percent)
       0%        1%       10%       50%       90%       99%      100% 
0.2179487 0.2224103 0.2900820 0.4145480 0.5076962 0.6203446 0.6441441 

Step 3: Creating the get.var function

We will write an R function as shown below to extract a variable (i.e. wp_nonfunctional) as a vector out of an sf data.frame.

  • arguments:

    • vname: variable name (as character, in quotes)

    • df: name of sf data frame

  • returns:

    • v: vector with values (without a column name)
get.var <- function(vname,df) {
  v <- df[vname] %>% 
    st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}

Step 4: we will write a percentile mapping function by using the code chunk below.

percentmap <- function(vnam, df, legtitle=NA, mtitle="Percentile Map"){
  percent <- c(0,.01,.1,.5,.9,.99,1)
  var <- get.var(vnam, df)
  bperc <- quantile(var, percent)
  tm_shape(df) +
  tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,
             title=legtitle,
             breaks=bperc,
             palette="Blues",
          labels=c("< 1%", "1% - 10%", "10% - 50%", "50% - 90%", "90% - 99%", "> 99%"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("right","bottom"))
}

Step 5: Test drive the percentile mapping function

percentmap("total_wp", NGA_wp)

7. First-Order Spatial Point Patterns Analysis Methods

Visualising the sf layers

It is always a good practice to plot the output sf layers on OSM layer to ensure that they have been imported properly and been projected on an appropriate projection system.

tmap_mode("view")
tm_shape(wp_functional) +
  tm_dots(alph = 0.5, 
          size=0.01,
          border.col = "blue",
          border.lwd = 0.5) +
  tm_shape(wp_nonfunctional) +
  tm_dots(alph = 0.5, 
          size=0.01,
          border.col = "yellow",
          border.lwd = 0.5) +
  tm_view(set.zoom.limits = c(8,12))

7.1 Converting SF Data Frames to SP’s Spatial Class

The code chunk below uses as_Spatial() of sf package to convert the geospatial data from simple data feature data frame to sp’s Spatial* class.

nga_sp <- as_Spatial(wp_sf_nga)
nga_sp
class       : SpatialPointsDataFrame 
features    : 5745 
extent      : 177285.9, 291287.1, 340054.1, 450859.7  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :             status_clean 
min values  : Abandoned/Decommissioned 
max values  :                  unknown 
nga_functional_sp <- as_Spatial(wp_functional)
nga_functional_sp
class       : SpatialPointsDataFrame 
features    : 2414 
extent      : 177285.9, 290751, 343128.1, 450859.7  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :              status_clean 
min values  :                Functional 
max values  : Functional but not in use 
nga_nonfunctional_sp <- as_Spatial(wp_nonfunctional)
nga_nonfunctional_sp
class       : SpatialPointsDataFrame 
features    : 2101 
extent      : 180539, 290546.5, 340054.1, 450780.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :             status_clean 
min values  : Abandoned/Decommissioned 
max values  :           Non-Functional 

Notice from the output message that the geospatial data wp_sf_nga, wp_functional, and wp_nonfunctional have all been converted to sp’s spatial* class now.

7.2 Converting the Spatial* Class into Generic SP Format

Spstat requires the analytical data in ppp object form. There is no direct way to convert a Spatial* classes into ppp object. We need to convert the Spatial classes* into Spatial object first.

The code chunk below converts the Spatial* class of NGA into generic sp object.

nga_sp <- as(nga_sp, "SpatialPoints")
nga_sp
class       : SpatialPoints 
features    : 5745 
extent      : 177285.9, 291287.1, 340054.1, 450859.7  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
nga_functional_sp <- as(nga_functional_sp, "SpatialPoints")
nga_functional_sp
class       : SpatialPoints 
features    : 2414 
extent      : 177285.9, 290751, 343128.1, 450859.7  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
nga_nonfunctional_sp <- as(nga_nonfunctional_sp, "SpatialPoints")
nga_nonfunctional_sp
class       : SpatialPoints 
features    : 2101 
extent      : 180539, 290546.5, 340054.1, 450780.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 

7.3 Converting the Generic SP Format into Spatstat’s ppp Format

Now, we will use as.ppp() function of spatstat to convert the spatial data into spatstat’s ppp object format.

nga_ppp <- as(nga_sp, "ppp")
nga_ppp
Planar point pattern: 5745 points
window: rectangle = [177285.9, 291287.05] x [340054.1, 450859.7] units
nga_functional_ppp <- as(nga_functional_sp, "ppp")
nga_functional_ppp
Planar point pattern: 2414 points
window: rectangle = [177285.9, 290750.96] x [343128.1, 450859.7] units
nga_nonfunctional_ppp <- as(nga_nonfunctional_sp, "ppp")
nga_nonfunctional_ppp
Planar point pattern: 2101 points
window: rectangle = [180538.96, 290546.54] x [340054.1, 450780.1] units

Let us then plot nga_ppp and examine the different.

plot(nga_ppp)

plot(nga_functional_ppp)

plot(nga_nonfunctional_ppp)

We can also look at the summary statistics of the newly created ppp object by using the code chunk below.

summary(nga_ppp)
Planar point pattern:  5745 points
Average intensity 4.547987e-07 points per square unit

Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units

Window: rectangle = [177285.9, 291287.05] x [340054.1, 450859.7] units
                    (114000 x 110800 units)
Window area = 1.2632e+10 square units

There are no warning message about duplicates but let’s do a double check before moving on :)

any(duplicated(nga_ppp))
[1] FALSE

From the generated output, we can confidently say that there is no duplication!

7.4 Creating owin object

When analysing spatial point patterns, it is a good practice to confine the analysis with a geographical area like Nigeria boundary. In spatstat, an object called owin is specially designed to represent this polygonal region.

The code chunk below is used to convert nga SpatialPolygon object into owin object of spatstat.

ONLY NIGERIA - OSUN STATE**

library(sf)
library(spatstat)

# Load the spatial point data
nga_spp <- st_as_sf(NGA, coords = c("x", "y"))

# Convert the "SpatialPoints" object to a polygon format using st_cast()
nga_poly <- st_cast(nga_spp, "POLYGON")

# Extract the polygon geometry component of the "sf" object
nga_geom <- st_geometry(nga_poly)

# Convert the polygon geometry component to a list of polygon objects
nga_list_poly <- as.list(nga_geom)

# Create an "owin" object from the list of polyggon objects
nga_owin <- as.owin(nga_spp)

# Analyze the point pattern using spatstat functions
plot(nga_owin)

The code chunk below is used to convert nga_geometry spatialpolygon object into owin object of spatstat.

ONLY NIGERIA**

library(sf)
library(spatstat)

# Load the spatial point data
geometry_spp <- st_as_sf(geoNGA, coords = c("x", "y"))

# Convert the "SpatialPoints" object to a polygon format using st_cast()
nga_poly <- st_cast(geometry_spp, "POLYGON")

# Extract the polygon geometry component of the "sf" object
geom <- st_geometry(nga_poly)

# Convert the polygon geometry component to a list of polygon objects
nga_list_poly <- as.list(geom)

#converting the spatial point into a projected coordinate system using the st_transform from the 'sf' package.
nga_poly_proj <- st_transform(nga_poly, crs = "+proj=utm +zone=30 +ellps=WGS84")

# Create an "owin" object from the list of polyggon objects
nga_geometry_owin <- as.owin(nga_poly_proj)

# Analyze the point pattern using spatstat functions
plot(nga_geometry_owin)

nga_ppp = nga_ppp[nga_owin]
plot(nga_ppp)

7.5 Kernel Density Estimation

In this section, we will learn how to compute the kernel density estimation (KDE). Some definitions:

  • Density: The amount of features or events per unit area

  • Density estimation: The construction of the density function from the observed data

  • Kernel: A window function fitted on each observation (weighted or unweighted) to determine the fraction of the observation used for density estimation at any location within the window

The code chunk below computes a kernel density by using the following configurations of density() of spatstat:

  • bw.diggle() automatic bandwidth selection method. Other recommended methods are bw.CvL(), bw.scott() or bw.ppl().

  • The smoothing kernel used is gaussian, which is the default. Other smoothing methods are: “epanechnikov”, “quartic” or “disc”.

  • The intensity estimate is corrected for edge effect bias by using method described by Jones (1993) and Diggle (2010, equation 18.9). The default is FALSE.

kde_nga_bw <- density(nga_ppp,
                      sigma=bw.diggle,
                      edge=TRUE,
                    kernel="gaussian")

kde_nga_functional_bw <- density(nga_functional_ppp,
                      sigma=bw.diggle,
                      edge=TRUE,
                    kernel="gaussian")

kde_nga_nonfunctional_bw <- density(nga_nonfunctional_ppp,
                      sigma=bw.diggle,
                      edge=TRUE,
                    kernel="gaussian")

The plot() function of Base R is then used to display the kernel density derived.

plot(kde_nga_bw)

plot(kde_nga_functional_bw)

plot(kde_nga_nonfunctional_bw)

bw <- bw.diggle(nga_ppp)
bw
   sigma 
175.2226 
bw_functional <-bw.diggle(nga_functional_ppp)
bw_functional
   sigma 
263.5313 
bw_nonfunctional <-bw.diggle(nga_nonfunctional_ppp)
bw_nonfunctional
   sigma 
296.0087 
nga_ppp.km <- rescale(nga_ppp, 100, "km")
kde_nga.bw <- density(nga_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
kde_nga_functional.bw <- density(nga_functional_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
kde_nga_nonfunctional.bw <- density(nga_nonfunctional_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")

plot(kde_nga.bw)

plot(kde_nga_functional.bw)

plot(kde_nga_nonfunctional.bw)

gridded_kde_nga_bw <- as.SpatialGridDataFrame.im(kde_nga.bw)
spplot(gridded_kde_nga_bw)

kde_nga.bw_rastor<- raster(gridded_kde_nga_bw)
kde_nga.bw_rastor
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 8.948485, 9.616045  (x, y)
extent     : 1765.032, 2910.438, 3314.347, 4545.201  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -6.500989e-17, 0.4183645  (min, max)
projection(kde_nga.bw_rastor) <- CRS("+init=EPSG:3414")
kde_nga.bw_rastor
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 8.948485, 9.616045  (x, y)
extent     : 1765.032, 2910.438, 3314.347, 4545.201  (xmin, xmax, ymin, ymax)
crs        : +init=EPSG:3414 
source     : memory
names      : v 
values     : -6.500989e-17, 0.4183645  (min, max)
tm_shape(kde_nga.bw_rastor) + 
  tm_raster("v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE)
7.5.1 Observations from Kernel Density Map
  1. Describe the spatial patterns revealed by the kernel density map
    • From the density map generated, we could tell that the distribution of water points are mostly distributed evenly across Nigeria, Osun state (between 0.10 to 0.15) with only a few spots that have higher intense colours (between 0.25 to 0.30).

    • There are still many areas in Nigeria, Osun state that have zero distribution of water points which means they do not have access to any water (0).

  2. Highlight the advantages of kernel density map over point map
    • Provides a continuous surface that represents the estimated density of points at each location. This allows to see the overall distribution of water points acorss Nigeria, Osun state more precisely with high or low densities. In contrast, a point map simply displays individual points, which can make it difficult to see patterns in the data.

    • Help identify hot spots or areas of high concentration, which can be useful for identifying clusters of points or areas of interest. This can be particularly useful in this assignment to identify areas with high intensity of water points in Nigeria, Osun state.

    • More visually appealing and easier to interpret than a point map, especially when dealing with large numbers of points.

In conclusion, a kernel density map can provide a more accurate representation of the spatial distribution of points and can be more useful for identifying patterns and hot spots in the data. However, a point map can still be useful for exploring the individual points and for understanding the distribution at the local scale. The choice between a kernel density map and a point map often depends on the specific needs of the analysis and the type of data being analyzed.

8. Second Spatial Point Patterns Analysis Methods

To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis are test as follows:

Ho = The distribution of water points in Nigeria, Osun state are randomly distributed

H1 = The distribution of water points in Nigeria, Osun sate are not randomly distributed

Confidence level = 95%

8.1 Analysing Spatial Point Process Using G-Function

The code chunk below is used to compute G-function using Gest() of spatat package.

G_nga_functional = Gest(nga_functional_ppp, correction = "border")
plot(G_nga_functional, xlim=c(0,1900))

G_nga_nonfunctional = Gest(nga_nonfunctional_ppp, correction = "border")
plot(G_nga_nonfunctional, xlim=c(0,1900))

8.1.1 Performing Complete Spatial Randomness Test

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

Monte Carlo test with G-function

G_nga_functional.csr <- envelope(nga_functional_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60........
.70.........80.........90.........100.........110.........120.........130......
...140.........150.........160.........170.........180.........190.........200....
.....210.........220.........230.........240.........250.........260.........270..
.......280.........290.........300.........310.........320.........330.........340
.........350.........360.........370.........380.........390.........400........
.410.........420.........430.........440.........450.........460.........470......
...480.........490.........500.........510.........520.........530.........540....
.....550.........560.........570.........580.........590.........600.........610..
.......620.........630.........640.........650.........660.........670.........680
.........690.........700.........710.........720.........730.........740........
.750.........760.........770.........780.........790.........800.........810......
...820.........830.........840.........850.........860.........870.........880....
.....890.........900.........910.........920.........930.........940.........950..
.......960.........970.........980.........990........ 999.

Done.
G_nga_nonfunctional.csr <- envelope(nga_nonfunctional_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60........
.70.........80.........90.........100.........110.........120.........130......
...140.........150.........160.........170.........180.........190.........200....
.....210.........220.........230.........240.........250.........260.........270..
.......280.........290.........300.........310.........320.........330.........340
.........350.........360.........370.........380.........390.........400........
.410.........420.........430.........440.........450.........460.........470......
...480.........490.........500.........510.........520.........530.........540....
.....550.........560.........570.........580.........590.........600.........610..
.......620.........630.........640.........650.........660.........670.........680
.........690.........700.........710.........720.........730.........740........
.750.........760.........770.........780.........790.........800.........810......
...820.........830.........840.........850.........860.........870.........880....
.....890.........900.........910.........920.........930.........940.........950..
.......960.........970.........980.........990........ 999.

Done.
plot(G_nga_functional.csr,xlim=c(0,1900))

plot(G_nga_nonfunctional.csr,xlim=c(0,1900))

8.1.2 Analysis of Spatial Point Pattern using G-Function

The G-function creates a graph of the “edge-corrected G-function”. The upper curve is the empirical distribution of nearest neighbor distances for the water points, adjusted for edge effects caused by a finite domain.

The lower curve shows the expected distribution for random uniform data of the same size on the same domain. The light-blue band is a 95% confidence envelope, which gives you a feeling for the variation due to random sampling.

The G line is above the envelope and spatial randomness line for both functional and non-functional G graphs therefore, the graphs indicate that there is no structure in the water point data and there is a complete spatial randomness.

In summary, I will accept my null hypothesis and reject the alternative hypothesis to conclude that the distribution of water points in Nigeria, Osun state are randomly distributed.

G-function is useful for analyzing properties of spatial point patterns. This article compared tree data to a single instance of random uniform data. By using the SPP procedure, you can run a more complete analysis and obtain graphs and related statistics with minimal effort.

8.2 Analysing Spatial Point Process Using L-Function

According to lesson 4 slides,

  • L(r) >0 indicates that the observed distribution is geographically concentrated

  • L(r) <0 implies dispersion

  • L(r) = 0 indicates complete spatial randomness(CRS)

L_nga_functional = Lest(nga_functional_ppp, correction= "Ripley")
plot(L_nga_functional, . -r ~ r, 
     ylab= "L(d)-r", xlab = "d(m)")

L_nga_nonfunctional = Lest(nga_nonfunctional_ppp, correction= "Ripley")
plot(L_nga_nonfunctional, . -r ~ r, 
     ylab= "L(d)-r", xlab = "d(m)")

L_nga_nonfunctional.csr <- envelope(nga_nonfunctional_ppp, Lest, nsim = 39, rank = 1, glocal=TRUE)
Generating 39 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,  39.

Done.
L_nga_functional.csr <- envelope(nga_functional_ppp, Lest, nsim = 39, rank = 1, glocal=TRUE)
Generating 39 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,  39.

Done.
plot(L_nga_functional.csr, . - r ~ r, xlab="d", ylab="L(d)-r")

plot(L_nga_nonfunctional.csr, . - r ~ r, xlab="d", ylab="L(d)-r")

8.2.1 Analysis of Spatial Point using L-Function

For distance more than 1000m, the L(d) - r function (black line) lies above the L(d) - r function at CRS (red line).

Therefore, I will accept my null hypothesis and reject the alternative hypothesis to conclude that the distribution of water points in Nigeria, Osun state are randomly distributed.

For both functions (G&L), I will accept null hypothese and reject the alternative hypothesis.

9. Spatial Correlation Analysis

What is Local Colocation Quotients (LCLQ):

A point event category A is colocated with point events of category B if it is surrounded by several point event category B within a specified distance.

E.g.

In this assignment, we are interested to find out if the spatial distribution of functional and non-functional water points are independent from each other.

To confirm the observed spatial correlation pattern, a hypothesis test will be conducted. The hypothesis are test as follows:

Ho = The distribution of functional and non-functional water points are independent from each other

H1 = The distribution of functional and non-functional water points are not independent from each other

Confidence level = 95%

wp_sf_clean <- wp_sf_nga %>%  filter(!status_clean=='unknown')

nb = include_self(st_knn(st_geometry(wp_sf_clean), 6)) 

wt = st_kernel_weights(nb, wp_sf_clean, "gaussian", adaptive = TRUE)
f = wp_sf_clean %>%
  filter(status_clean == "Functional")

A = f$status_clean

nf = wp_sf_clean %>%
  filter(status_clean == "Non-Functional")
B = nf$status_clean

LCLQ = local_colocation(A, B, nb, wt, 49)
LCLQ_wp = cbind(wp_sf_clean, LCLQ)

Code breakdown:

  • The code first filters the wp_sf_clean object to remove any entries with a “status_clean” of “unknown”.

  • Then, it calculates the 6 nearest neighbors for each feature in the wp_sf_clean object using the st_knn function from the sf library and creates a weight matrix for each feature with the st_kernel_weights function, using a Gaussian kernel.

  • Next, the code creates two separate objects, f and nf, which contain only the “Functional” and “Non-Functional” features from the wp_sf_clean object, respectively.

  • The local_colocation function is then applied to the two objects, using the nearest neighbor information and weight matrix calculated earlier, with a neighborhood size of 49. Finally, the output of the local_colocation function is combined with the wp_sf_clean object using cbind, creating a new object LCLQ_wp.

tmap_mode("view")
tm_shape(NGA_wp) +
  tm_polygons() +
  tm_shape(LCLQ_wp) +
  tm_dots(col = c("Non.Functional"), 
          size = 0.01,
          border.col = "black",
          border.lwd = 0.5) +
  tm_view(set.bounds = c(4,7,5,8),
          set.zoom.limits = c(8, 13))

9.1 Analysis of Spatial Correlation

The tool will determine, for each feature of the Category of Interest, whether the features of the Neighboring Category are more or less present in its neighborhood compared to the overall spatial distribution of the categories. For example, for each feature of category A, a resulting local colocation quotient (LCLQ) value of 1 means that you are as likely to have category B as a neighbor as you might expect. A LCLQ value greater than 1 means you are more likely (than random) to have B as a neighbor, and a LCLQ value less than 1 means that the feature of category A is less likely to have a category B point as your neighbor (than a random distribution).

According to the map generated, the proportion of A (functional points) within the neighborhood of B (nonfunctional points) is higher than the global porportion of A, the colocation quotient is therefore high or based on the slides explanation, 99% of the point have a value of 1 which means that it is likely to have both functional and nonfunctional points collocate together.

Therefore, I will reject the null hypothesis and accept the alternative hypothesis saying that the distribution of functional and non-functional water points are not independent from each other.

10. References & Resources Used

Here are the list of resources used in this analysis, as well as their links. Special thanks to Seniors work samples and Prof Kam for all the detailed explanations and clear documentary posted!! :))

Section 5.3.1: https://bolt.mph.ufl.edu/6050-6052/unit-1/one-quantitative-variable-introduction/understanding-outliers/

Section 7.5: https://gistbok.ucgis.org/bok-topics/kernels-and-density-estimation

Section 8.1: https://blogs.sas.com/content/iml/2016/09/19/nearest-neighbor-distances.html

Section 8.2.1: https://www.mattpeeples.net/modules/PointPattern.html

Section 9.1: https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/colocationanalysis.htm